-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENG-50887: Mask value using a masking config #227
Conversation
siddhant2001
commented
Sep 27, 2024
•
edited
Loading
edited
- Write tests and verify correctness
Test Results330 tests +5 330 ✅ +5 11s ⏱️ ±0s Results for commit d38e08c. ± Comparison against base commit e092fbf. This pull request removes 72 and adds 18 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
tenantScopedMaskingCriteria = [ | ||
{ | ||
"tenantId": "testTenant", | ||
"timeRangeAndMaskValues": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we can have multiple TimeRange conditions, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, each timeRange condition will have it's own masking values
@@ -94,6 +94,27 @@ service.config = { | |||
# ] | |||
# } | |||
# ] | |||
tenantScopedMaskingCriteria = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's comment it in main application.conf file here. As a default, there is no masking applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes done, you've reviewed an older commit
public class HandlerScopedMaskingConfig { | ||
private static final String TENANT_SCOPED_MASKS_CONFIG_KEY = "tenantScopedMaskingCriteria"; | ||
private final Map<String, List<MaskValuesForTimeRange>> tenantToMaskValuesMap; | ||
private HashMap<String, Boolean> shouldMaskAttribute = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is key in both these maps - shouldMaskAttribute
and maskedValue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AttributeId is the key to both.
Removing shouldMaskAttribute as it is not needed.
|
||
private static boolean isTimeRangeOverlap( | ||
MaskValuesForTimeRange timeRangeAndMasks, Instant queryStartTime, Instant queryEndTime) { | ||
boolean timeRangeOverlap = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default should be false, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following conditionals check for no overlap, i.e. they set the timeRangeOverlap to false. This statement is correct.
timeRangeOverlap = false; | ||
} | ||
|
||
Instant endTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getStartTimeMillis().get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be getEndTimeMillis
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, is this what we are looking as function?
if (timeRangeAndMasks.getStartTimeMillis().isPresent()) {
Instant startTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getStartTimeMillis().get());
Instant endTimeInstant = Instant.ofEpochMilli(timeRangeAndMasks.getEndTimeMillis().get());
if (!(startTimeInstant.isAfter(queryEndTime) || endTimeInstant.isBefore(queryStartTime))) {
return true;
}
}
return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, fixed the condition.
} | ||
|
||
public void parseColumns(ExecutionContext executionContext) { | ||
shouldMaskAttribute.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the state is maintained per request, but we should only test against the timeRange condition.
Pre-compute using config:
- tenantId -> List of timeRanges
- timeRange -> set of attributes
- timeRange -> map(attributeId, maskValue)
During response processing:
- Check if any time range matches.
- Pick the first match (or should we apply UNION?).
- If UNION is used, and the same attribute is present in two time ranges with different mask values, which one should we consider? I guess any Should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how I've done it.
In case of attribute in multiple time ranges, I choose any value.
return Observable.fromIterable(rowBuilderList) | ||
.map(Builder::build) | ||
// .map(row -> handlerScopedMaskingConfig.mask(row)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
# "tenantId": "testTenant", | ||
# "timeRangeAndMaskValues": [ | ||
# { | ||
# "startTimeMillis": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we should take the timestamp as mandatory.
- if startTime or endTime missing -> log an warn stating that the filter will be ignored.
if (indexToLogicalName.containsKey(colIdx)) { | ||
return indexToLogicalName.get(colIdx); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove additional space.
@@ -149,4 +157,12 @@ String getDataFromRow(int rowIndex, String logicalName) { | |||
} | |||
return result; | |||
} | |||
|
|||
String getLogicalNameFromColIdx(Integer colIdx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Lets see if we can use Optional as return type.
@@ -58,6 +59,10 @@ public class PinotBasedRequestHandler implements RequestHandler { | |||
private static final String START_TIME_ATTRIBUTE_NAME_CONFIG_KEY = "startTimeAttributeName"; | |||
private static final String SLOW_QUERY_THRESHOLD_MS_CONFIG = "slowQueryThresholdMs"; | |||
|
|||
private static final String MASKED_VALUE = "*"; | |||
// This is how empty list is represented in Pinot | |||
private static final String PINOT_EMPTY_LIST = "[\"\"]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PINOT_EMPTY_LIST
-> ARRAY_TYPE_MASKED_VALUE
MASKED_VALUE
-> DEFAULT_MASKED_VALUE
public List<String> getMaskedAttributes(ExecutionContext executionContext) { | ||
String tenantId = executionContext.getTenantId(); | ||
List<String> maskedAttributes = new ArrayList<>(); | ||
// maskedValue.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this.
this.tenantId = config.getString(TENANT_ID_CONFIG_KEY); | ||
this.maskValues = | ||
config.getConfigList(TIME_RANGE_AND_MASK_VALUES_CONFIG_KEY).stream() | ||
.map(MaskValuesForTimeRange::new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter out the empty maskings
@@ -536,7 +549,11 @@ private void handleSelection( | |||
for (String logicalName : selectedAttributes) { | |||
// colVal will never be null. But getDataRow can throw a runtime exception if it failed | |||
// to retrieve data | |||
String colVal = resultAnalyzer.getDataFromRow(rowId, logicalName); | |||
String colVal = | |||
!maskedAttributes.contains(logicalName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : get rid of the ! and invert to simplify
…ry-service into mask-unbofuscated-cookies
@@ -493,21 +500,26 @@ private Filter rewriteLeafFilter( | |||
return queryFilter; | |||
} | |||
|
|||
Observable<Row> convert(ResultSetGroup resultSetGroup, LinkedHashSet<String> selectedAttributes) { | |||
Observable<Row> convert(ResultSetGroup resultSetGroup, ExecutionContext executionContext) { | |||
LinkedHashSet<String> selectedAttributes = executionContext.getSelectedColumns(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this also inside resultSetGroup.getResultSetCount() > 0
?
for (int colIdx = 0; colIdx < resultSet.getColumnCount(); colIdx++) { | ||
for (int colIdx = 0, logicalColIdx = 0; | ||
colIdx < resultSet.getColumnCount(); | ||
colIdx++, logicalColIdx++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need logicalColIdx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be multiple map fields. When creating the idxToLogical name map, a map field would only increment the counter by one. Here each map field increments colIdx by 2. That's why I have created a new variable, which is incremented only once even if we go through a maps 2 columns (key and value)
@@ -149,4 +158,8 @@ String getDataFromRow(int rowIndex, String logicalName) { | |||
} | |||
return result; | |||
} | |||
|
|||
Optional<String> getLogicalNameFromColIdx(Integer colIdx) { | |||
return Optional.ofNullable(indexToLogicalName.get(colIdx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On what scenario, can it be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it shouldn't be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM